<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
  PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1">
  <title id="kx20941">gpmapreduce</title>
  <!-- This file is the target of a conref from client_tool_guides/client/unix/gpmapreduce.xml. -->
  <body>
    <p>Runs Greenplum MapReduce jobs as defined in a YAML specification document. </p>
    <section id="section2">
      <title>Synopsis</title>
      <codeblock>gpmapreduce -f &lt;config.yaml> [dbname [&lt;username>]] 
     [-k &lt;name=value> | --key &lt;name=value>] 
     [-h &lt;hostname> | --host &lt;hostname>] [-p &lt;port>| --port &lt;port>] 
     [-U &lt;username> | --username &lt;username>] [-W] [-v]

gpmapreduce -x | --explain 

gpmapreduce -X | --explain-analyze

gpmapreduce -V | --version 

gpmapreduce -h | --help </codeblock>
    </section>
    <section id="section3">
      <title>Requirements</title>
      <p>The following are required prior to running this program:</p>
      <ul id="ul_v5h_bxq_dp">
        <li id="kx142094">You must have your MapReduce job defined in a YAML file. See <xref
            href="gpmapreduce-yaml.xml#topic1" type="topic" format="dita"/> for more information
          about the format of, and keywords supported in, the Greenplum MapReduce YAML configuration
          file.</li>
        <li id="kx141300">You must be a Greenplum Database superuser to run MapReduce jobs written
          in untrusted Perl or Python.</li>
        <li id="kx141310">You must be a Greenplum Database superuser to run MapReduce jobs with
            <codeph>EXEC</codeph> and <codeph>FILE</codeph> inputs.</li>
        <li id="kx141334">You must be a Greenplum Database superuser to run MapReduce jobs with
            <codeph>GPFDIST</codeph> input unless the user has the appropriate rights granted.
        </li>
      </ul>
    </section>
    <section id="section4">
      <title>Description</title>
      <p><xref href="https://en.wikipedia.org/wiki/MapReduce" scope="external" format="html"
          >MapReduce</xref> is a programming model developed by Google for processing and generating
        large data sets on an array of commodity servers. Greenplum MapReduce allows programmers who
        are familiar with the MapReduce paradigm to write map and reduce functions and submit them
        to the Greenplum Database parallel engine for processing.</p>
      <p><codeph>gpmapreduce</codeph> is the Greenplum MapReduce program. You configure a
        Greenplum MapReduce job via a YAML-formatted configuration file that you pass to
        the program for execution by the Greenplum Database parallel engine. The
        Greenplum Database system distributes the input data, runs the program
        across a set of machines, handles machine failures, and manages the required
        inter-machine communication.</p>
    </section>
    <section id="section5">
      <title>Options</title>
      <parml>
        <plentry>
          <pt>-f <varname>config.yaml</varname></pt>
          <pd>Required. The YAML file that contains the Greenplum MapReduce job definitions. Refer
            to <xref href="gpmapreduce-yaml.xml#topic1" type="topic" format="dita"/> for the format
            and content of the parameters that you specify in this file.</pd>
        </plentry>
        <plentry>
          <pt>-? | --help</pt>
          <pd>Show help, then exit.</pd>
        </plentry>
        <plentry>
          <pt>-V | --version</pt>
          <pd>Show version information, then exit.</pd>
        </plentry>
        <plentry>
          <pt>-v | --verbose</pt>
          <pd>Show verbose output.</pd>
        </plentry>
        <plentry>
          <pt>-x | --explain</pt>
          <pd>Do not run MapReduce jobs, but produce explain plans.</pd>
        </plentry>
        <plentry>
          <pt>-X | --explain-analyze</pt>
          <pd>Run MapReduce jobs and produce explain-analyze plans.</pd>
        </plentry>
        <plentry>
          <pt>-k | --keyname=<varname>value</varname></pt>
          <pd>Sets a YAML variable. A value is required. Defaults to "key" if no variable name is
            specified. </pd>
        </plentry>
      </parml>
      <sectiondiv id="section6">
        <b>Connection Options</b>
        <parml>
          <plentry>
            <pt>-h <varname>host</varname> | --host <varname>host</varname></pt>
            <pd>Specifies the host name of the machine on which the Greenplum master database server
              is running. If not specified, reads from the environment variable
                <codeph>PGHOST</codeph> or defaults to localhost.</pd>
          </plentry>
          <plentry>
            <pt>-p <varname>port</varname> | --port <varname>port</varname></pt>
            <pd>Specifies the TCP port on which the Greenplum master database server is listening
              for connections. If not specified, reads from the environment variable
                <codeph>PGPORT</codeph> or defaults to 5432.</pd>
          </plentry>
          <plentry>
            <pt>-U <varname>username</varname> | --username <varname>username</varname></pt>
            <pd>The database role name to connect as. If not specified, reads from the environment
              variable <codeph>PGUSER</codeph> or defaults to the current system user name.</pd>
          </plentry>
          <plentry>
            <pt>-W | --password</pt>
            <pd>Force a password prompt.</pd>
          </plentry>
        </parml>
      </sectiondiv>
    </section>
    <section id="section7">
      <title>Examples</title>
      <p>Run a MapReduce job as defined in <codeph>my_mrjob.yaml</codeph> and connect to the database
          <codeph>mydatabase</codeph>:</p>
      <codeblock>gpmapreduce -f my_mrjob.yaml mydatabase</codeblock>
    </section>
    <section id="section8">
      <title>See Also</title>
      <p><xref href="gpmapreduce-yaml.xml#topic1" type="topic" format="dita"/></p>
    </section>
  </body>
</topic>
